fix(ai): fix ai model config parsing #3173

Open
wants to merge 1 commit into
base: master

ad-astra-video
Collaborator

What does this pull request do? Explain your changes. (required)

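Parsing aiModels.json failed to set a price for a model when its config blocks were interleaved with blocks for a different pipeline/model. In the reproduction logs below, the fourth config block (segment-anything-2) is reported with capability ID 27, which is the ID logged for text-to-image, and the node then panics because the price passed to FloatString is nil. See the logs below and the attached aiModels.json. Note that some extra log lines were added for visibility into aiCaps and the autoPrice set from the config.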

vires-in-numeris pointed out this bug with the following log lines:

2024/09/17 14:10:30 INFO Starting external container name=segment-anything-2_facebook-sam2-hiera-large_http://localhost:9010 modelID=facebook/sam2-hiera-large
I0917 14:10:30.334105       1 db.go:368] Closing DB
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x19ff1cd]

goroutine 74 [running]:
math/big.(*Rat).IsInt(...)
        /usr/local/go/src/math/big/rat.go:401
math/big.(*Rat).FloatString(0x0, 0x3)
        /usr/local/go/src/math/big/ratconv.go:333 +0x2d
github.com/livepeer/go-livepeer/cmd/livepeer/starter.StartLivepeer({_, _}, {0xc000c84250, 0xc000c84260, 0xc000c84270, 0xc000c84280, 0xc000c84290, 0xc000c842c0, 0xc000c842a0, 0xc000c84410, ...})
        /src/cmd/livepeer/starter/starter.go:1337 +0xb03e
main.main.func1()
        /src/cmd/livepeer/livepeer.go:97 +0x59
created by main.main in goroutine 1
        /src/cmd/livepeer/livepeer.go:96 +0xbe5
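
The trace shows `FloatString` being called with a nil receiver (`FloatString(0x0, 0x3)`). As a minimal sketch of that failure mode, and not the actual code in starter.go, the snippet below calls `FloatString` on a nil `*big.Rat` and recovers the resulting panic. The `formatPrice` helper is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"math/big"
)

// formatPrice is a hypothetical helper that mimics a log call formatting a
// price that may be nil. It recovers the panic so the program can report it.
func formatPrice(price *big.Rat) (s string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	// With a nil price, this call dereferences a nil pointer inside math/big.
	return price.FloatString(3), nil
}

func main() {
	var missing *big.Rat // a price that was never set
	_, err := formatPrice(missing)
	fmt.Println("nil price:", err)

	// A price that was set formats without error.
	s, err := formatPrice(big.NewRat(10353998886, 1000))
	fmt.Println("set price:", s, err)
}
```

In the real node the panic is not recovered, so startup aborts as shown in the trace.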

I was able to reproduce the seg fault with the attached aiModels.json.

aiModels.json

livepeer-test-orchestrator-sam2-1  | I0917 23:11:54.262617       1 pricefeedwatcher.go:164] Starting PriceFeed watch loop
livepeer-test-orchestrator-sam2-1  | 2024/09/17 23:11:55 INFO Starting external container name=segment-anything-2_facebook-sam2-hiera-large_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=facebook/sam2-hiera-large
livepeer-test-orchestrator-sam2-1  | I0917 23:11:55.144057       1 starter.go:1335] +v%!(EXTRA []core.Capability=[32])
livepeer-test-orchestrator-sam2-1  | I0917 23:11:55.144114       1 starter.go:1338] Capability segment-anything-2 (ID: 32) advertised with model constraint facebook/sam2-hiera-large at price 10353998.886 wei per compute unit
livepeer-test-orchestrator-sam2-1  | I0917 23:11:55.144149       1 starter.go:1339] Capability segment-anything-2 (ID: 32) advertised with model constraint facebook/sam2-hiera-large at price 10353998.886 wei per compute unit
livepeer-test-orchestrator-sam2-1  | 2024/09/17 23:11:55 INFO Starting external container name=segment-anything-2_facebook-sam2-hiera-large_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=facebook/sam2-hiera-large
livepeer-test-orchestrator-sam2-1  | I0917 23:11:55.867284       1 starter.go:1335] +v%!(EXTRA []core.Capability=[32])
livepeer-test-orchestrator-sam2-1  | I0917 23:11:55.867352       1 starter.go:1338] Capability segment-anything-2 (ID: 32) advertised with model constraint facebook/sam2-hiera-large at price 10353998.886 wei per compute unit
livepeer-test-orchestrator-sam2-1  | I0917 23:11:55.867420       1 starter.go:1339] Capability segment-anything-2 (ID: 32) advertised with model constraint facebook/sam2-hiera-large at price 10353998.886 wei per compute unit
livepeer-test-orchestrator-sam2-1  | 2024/09/17 23:11:56 INFO Starting external container name=text-to-image_ByteDance-SDXL-Lightning_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=ByteDance/SDXL-Lightning
livepeer-test-orchestrator-sam2-1  | I0917 23:11:56.558637       1 starter.go:1335] +v%!(EXTRA []core.Capability=[32 27])
livepeer-test-orchestrator-sam2-1  | I0917 23:11:56.558696       1 starter.go:1338] Capability text-to-image (ID: 27) advertised with model constraint ByteDance/SDXL-Lightning at price 817750.957 wei per compute unit
livepeer-test-orchestrator-sam2-1  | I0917 23:11:56.558730       1 starter.go:1339] Capability text-to-image (ID: 27) advertised with model constraint ByteDance/SDXL-Lightning at price 817750.957 wei per compute unit
livepeer-test-orchestrator-sam2-1  | 2024/09/17 23:11:57 INFO Starting external container name=segment-anything-2_facebook-sam2-hiera-large_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=facebook/sam2-hiera-large
livepeer-test-orchestrator-sam2-1  | I0917 23:11:57.315927       1 starter.go:1335] +v%!(EXTRA []core.Capability=[32 27])
livepeer-test-orchestrator-sam2-1  | I0917 23:11:57.315983       1 starter.go:1338] Capability segment-anything-2 (ID: 27) advertised with model constraint facebook/sam2-hiera-large at price 10353998.886 wei per compute unit
livepeer-test-orchestrator-sam2-1  | I0917 23:11:57.316071       1 db.go:368] Closing DB
livepeer-test-orchestrator-sam2-1  | panic: runtime error: invalid memory address or nil pointer dereference
livepeer-test-orchestrator-sam2-1  | [signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x19ff44d]
livepeer-test-orchestrator-sam2-1  |
livepeer-test-orchestrator-sam2-1  | goroutine 11 [running]:
livepeer-test-orchestrator-sam2-1  | math/big.(*Rat).IsInt(...)
livepeer-test-orchestrator-sam2-1  |    /usr/local/go/src/math/big/rat.go:401
livepeer-test-orchestrator-sam2-1  | math/big.(*Rat).FloatString(0x0, 0x3)
livepeer-test-orchestrator-sam2-1  |    /usr/local/go/src/math/big/ratconv.go:333 +0x2d
livepeer-test-orchestrator-sam2-1  | github.com/livepeer/go-livepeer/cmd/livepeer/starter.StartLivepeer({_, _}, {0xc000051600, 0xc000051610, 0xc000051620, 0xc000051630, 0xc000051640, 0xc000051670, 0xc000051650, 0xc0000517c0, ...})
livepeer-test-orchestrator-sam2-1  |    /src/cmd/livepeer/starter/starter.go:1339 +0xb2de
livepeer-test-orchestrator-sam2-1  | main.main.func1()
livepeer-test-orchestrator-sam2-1  |    /src/cmd/livepeer/livepeer.go:97 +0x59
livepeer-test-orchestrator-sam2-1  | created by main.main in goroutine 1
livepeer-test-orchestrator-sam2-1  |    /src/cmd/livepeer/livepeer.go:96 +0xbe5
livepeer-test-orchestrator-sam2-1 exited with code 2

Specific updates (required)

  • update cmd/livepeer/starter/starter.go to track the capability of the current config block and look up the price for that block's capability/model_id
  • add a nil check to GetBasePriceForCap
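
The two updates above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `capModel`, `priceTable`, `basePriceForCap`, `describePrice`, and `examplePrices` are hypothetical names, and the real `GetBasePriceForCap` may have a different signature. The idea is that each config block carries its own capability ID, the price is looked up with that ID and the block's model ID, and a nil result is handled before formatting.

```go
package main

import (
	"fmt"
	"math/big"
)

// capModel is a hypothetical lookup key: one capability ID plus one model ID.
type capModel struct {
	capability int
	modelID    string
}

// priceTable is a hypothetical stand-in for the node's per-capability prices.
type priceTable map[capModel]*big.Rat

// basePriceForCap returns the stored price, or nil if none was set.
func (t priceTable) basePriceForCap(capability int, modelID string) *big.Rat {
	return t[capModel{capability, modelID}]
}

// describePrice formats the price for logging and returns an error instead of
// panicking when no price exists for the capability/model pair.
func describePrice(t priceTable, capability int, modelID string) (string, error) {
	price := t.basePriceForCap(capability, modelID)
	if price == nil {
		return "", fmt.Errorf("no price set for capability %d, model %s", capability, modelID)
	}
	return price.FloatString(3), nil
}

// examplePrices holds the two prices that appear in the test logs.
func examplePrices() priceTable {
	return priceTable{
		{32, "facebook/sam2-hiera-large"}: big.NewRat(10312544441, 1000),
		{27, "ByteDance/SDXL-Lightning"}:  big.NewRat(814476917, 1000),
	}
}

func main() {
	prices := examplePrices()
	// Config blocks in the order seen in the logs; each keeps its own capability.
	blocks := []capModel{
		{32, "facebook/sam2-hiera-large"},
		{32, "facebook/sam2-hiera-large"},
		{27, "ByteDance/SDXL-Lightning"},
		{32, "facebook/sam2-hiera-large"},
	}
	for _, b := range blocks {
		s, err := describePrice(prices, b.capability, b.modelID)
		fmt.Println(b.capability, b.modelID, s, err)
	}
	// A mismatched pair (ID 27 with the SAM2 model) yields an error, not a panic.
	_, err := describePrice(prices, 27, "facebook/sam2-hiera-large")
	fmt.Println(err)
}
```

Keying the lookup on the block's own capability avoids reusing an ID left over from an earlier block, and the nil check keeps a missing price from crashing startup.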

How did you test each of these updates (required)

Rebuilt the Docker container and ran it with the same aiModels.json. The orchestrator node now starts up:

livepeer-test-orchestrator-sam2-1  | I0918 00:00:32.331950       1 pricefeedwatcher.go:164] Starting PriceFeed watch loop
livepeer-test-orchestrator-sam2-1  | 2024/09/18 00:00:33 INFO Starting external container name=segment-anything-2_facebook-sam2-hiera-large_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=facebook/sam2-hiera-large
livepeer-test-orchestrator-sam2-1  | I0918 00:00:33.286424       1 starter.go:1347] Capability segment-anything-2 (ID: 32) advertised with model constraint facebook/sam2-hiera-large at price 10312544.441 wei per compute unit
livepeer-test-orchestrator-sam2-1  | 2024/09/18 00:00:33 INFO Starting external container name=segment-anything-2_facebook-sam2-hiera-large_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=facebook/sam2-hiera-large
livepeer-test-orchestrator-sam2-1  | I0918 00:00:33.974178       1 starter.go:1347] Capability segment-anything-2 (ID: 32) advertised with model constraint facebook/sam2-hiera-large at price 10312544.441 wei per compute unit
livepeer-test-orchestrator-sam2-1  | 2024/09/18 00:00:34 INFO Starting external container name=text-to-image_ByteDance-SDXL-Lightning_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=ByteDance/SDXL-Lightning
livepeer-test-orchestrator-sam2-1  | I0918 00:00:34.641733       1 starter.go:1347] Capability text-to-image (ID: 27) advertised with model constraint ByteDance/SDXL-Lightning at price 814476.917 wei per compute unit
livepeer-test-orchestrator-sam2-1  | 2024/09/18 00:00:35 INFO Starting external container name=segment-anything-2_facebook-sam2-hiera-large_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=facebook/sam2-hiera-large
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.380959       1 starter.go:1347] Capability segment-anything-2 (ID: 32) advertised with model constraint facebook/sam2-hiera-large at price 10312544.441 wei per compute unit
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.381782       1 starter.go:1621] ***Livepeer Running in Orchestrator Mode***
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.381836       1 starter.go:1631] Livepeer Node version: 0.7.8-ai.2-5b91100d-dirty
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.381888       1 mediaserver.go:204] Transcode Job Type: [{P240p30fps4x3 600k 30 0 320x240 4:3 0 0 0s 0 0 0 0} {P360p30fps16x9 1200k 30 0 640x360 16:9 0 0 0s 0 0 0 0}]
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.382027       1 webserver.go:20] CLI server listening on 127.0.0.1:7777
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.390879       1 cert.go:83] Private key and cert not found. Generating
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.395048       1 cert.go:22] Generating cert for 127.0.0.1
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.402253       1 rpc.go:220] Listening for RPC on :8888
livepeer-test-orchestrator-sam2-1  | I0918 00:00:37.383588       1 rpc.go:305] Connecting RPC to uri=https://127.0.0.1:8888
livepeer-test-orchestrator-sam2-1  | I0918 00:00:37.388396       1 rpc.go:258] Received Ping request
livepeer-test-orchestrator-sam2-1  | I0918 00:00:52.194808       1 block_watcher.go:454] Polling blocks from=254618661 to=254618741

Does this pull request close any open issues?

Checklist:

github-actions bot added the AI label (Issues and PR related to the AI-video branch) on Sep 18, 2024
leszko deleted the branch livepeer:master on Nov 7, 2024, 08:26
leszko closed this on Nov 7, 2024
rickstaa reopened this on Nov 13, 2024
rickstaa changed the base branch from ai-video to master on Nov 13, 2024, 21:53
@rickstaa
Member

@ad-astra-video, is this still a problem after the refactors that were included with the AI remote worker?
